Handbook of Big Data by unknow

Handbook of Big Data by unknow

Author:unknow
Language: eng
Format: epub
Published: 2016-01-21T11:00:08+00:00


208

Handbook of Big Data

12.6.5

Complexities with Power-Law Graphs

Power-law, or highly skewed, degree distribution pose problems for efficient implementations

of generalized matrix-vector products at scale. The PowerGraph extension of GraphLab [22]

contains a number of ideas to improve performance with formal reductions and power-law

graphs based on vertex splitting ideas that create virtual copies of vertices with lower degree.

These large degree vertices, however, tend not to prohibit implementations of these ideas

on large graphs and only make them slower.

12.6.6

Implementations in SQL

Generalized matrix-vector products are possible to implement even in traditional database

systems that implement SQL [74]. Using a distributed SQL database such as Greenplum will

evaluate these tasks with reasonable efficiency even for large graphs. As an example, here we

illustrate how to perform the generalized matrix-vector product for connected components

in SQL. The graph is stored as a table that provides edge-list access. ∗ The columns head

and tail indicate the start and end of each edge. The initial vector is stored as a table with

a vertex id and the unique id associated with it (which could be the same).

edges : id | head | tail

x : id | comp

In the iteration, we create the vector x(next) from x:

CREATE TABLE xnext AS (

SELECT e.tail AS id, MIN(x.comp) AS comp

FROM edges e INNER JOIN x ON e.head = x.id

GROUP BY e.tail );

This query takes the graph structure, joins it with the vector such that each component of

the table x is mapped to the head of each edge. Then we group them by the tail of each

edge and take the MIN function over all components. This is exactly what the iteration in

the connected components example did.

12.7

Limitations and Tradeoffs in Mining Large Graph

In many cases, individuals who employ graph mining tools want to obtain some sort of

qualitative understanding of or insight into their data. This soft and subtle goal differs in

important ways from simply using the graph to obtain better prediction accuracy in some

well-defined downstream machine learning task. In particular, it can differ from the use of

the algorithms and techniques we have focused on in the previous sections, for example,

when those algorithms are used as black-box components in larger analytics frameworks such

as various machine learning pipelines that are increasingly common. A common temptation

in this setting is to use the intuition obtained from mining small graph, assuming or hoping

that the intuition thereby obtained is relevant or useful for much larger graphs. In general,

it is not. Using intuition obtained from small graphs can lead to qualitatively incorrect

understanding of the behavior and properties of larger graphs; it can lead to qualitatively

incorrect understanding of the behavior of algorithms that run on larger graphs.

∗ We assume that the graph structure is undirected, so both edges ( u, v) and ( v, u) are stored, and that

each vertex has a self-loop.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.